Introducing the iso3c_kr Function in the kdiplo R Package for Converting Korean Country Names into iso3c Country Codes
Korea
Korean dataset
dataset
R
R package
diplomacy
kdiplo
Author
Kadir Jun Ayhan
Published
Tuesday, April 23, 2024
In my research, I often work with country-year data from Korean sources, including data on diplomatic visits, trade, aid and so on. One of the fundamental difficulties I have had is the lack of universal country codes across different datasets. Further complicating matters is the inconsistency of country names in these datasets. For example, the Democratic Republic of the Congo appears under five different names or spellings across the official sources that I could find: 콩고 민주공화국, 자이르, 콩고민주공화국, 콩고 민주 공화국, 콩고민주공화국(DR콩고).
To address this issue, I have created a function in my kdiplo package that converts Korean country names into ISO 3166-1 alpha-3 (iso3c) country codes. This function, iso3c_kr, assigns universal iso3c country codes to Korean-language country names, which makes it easier to join different kinds of data.
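To illustrate the idea, here is a minimal sketch of a lookup-based conversion. It is not the package's actual implementation: the `lookup` table and the `visits` data are made up for illustration, and only the five names quoted above come from the text.

```r
library(dplyr)

# Hypothetical lookup table: each Korean name quoted above maps to one code.
lookup <- tibble(
  country_kr = c("콩고 민주공화국", "자이르", "콩고민주공화국",
                 "콩고 민주 공화국", "콩고민주공화국(DR콩고)"),
  iso3c = "COD"
)

# Made-up data that uses two different names for the same country.
visits <- tibble(
  country_kr = c("자이르", "콩고민주공화국"),
  year = c(1985, 2015)
)

# A left join attaches the shared code, so both rows can be matched on iso3c.
visits_coded <- visits %>% left_join(lookup, by = "country_kr")
```

Once every variant carries the same code, later joins no longer depend on how a particular source happened to write the name.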
One still needs to check that the output is correct, especially for countries that have gone through political transitions, such as Germany, Yugoslavia, Russia, Vietnam, and Yemen.
Korean government sources sometimes report overlapping data for Yugoslavia and Serbia, for example. In such cases, inspect the overlapping rows and confirm that the values are correct before relying on the assigned codes.
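One way to spot such overlaps is to look for country-years that occur more than once after conversion. The sketch below uses made-up rows; the codes and values are purely illustrative and do not show what `iso3c_kr` actually returns for these names.

```r
library(dplyr)

# Made-up rows after conversion: two source names share a code in one year.
coded <- tibble(
  country_kr = c("유고슬라비아", "세르비아", "세르비아"),
  iso3c = c("SRB", "SRB", "SRB"),
  year = c(2005, 2005, 2010),
  export_kosis = c(100, 120, 150)
)

# Country-years that appear more than once deserve a manual look.
overlaps <- coded %>%
  group_by(iso3c, year) %>%
  filter(n() > 1) %>%
  ungroup()
```

Here `overlaps` keeps only the two 2005 rows, which are the ones to compare against the original source.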
Wide format is quite common in official Korean data sources, and the trade data is no exception. Before using the iso3c_kr function, let’s first transform the trade data into a long (country-year) format so that it matches the format of the aid data. This makes joining the two datasets easier.
Code
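library(tidyverse)
library(magrittr)

# `trade` holds the KOSIS trade sheet as read with readxl::read_xlsx().
export <- trade
import <- trade

# Exports: drop the stray last column, reshape to long format, rename columns.
export %<>% select(-`...63`)
export_long <- export %>%
  pivot_longer(4:62, names_to = "year", values_to = "export_kosis")
export_long %<>% set_names(c("country_kr", "type", "unit", "year", "export_kosis"))
# "수출액[천달러]" is the export value in thousand US dollars; convert to dollars.
export_long %<>%
  filter(type == "수출액[천달러]") %>%
  mutate(export_kosis = as.numeric(export_kosis) * 1000,
         year = parse_number(year)) %>%
  select(-type, -unit)

# Imports: same steps; "수입액[천달러]" is the import value in thousand US dollars.
import %<>% select(-`...63`)
import_long <- import %>%
  pivot_longer(4:62, names_to = "year", values_to = "import_kosis")
import_long %<>% set_names(c("country_kr", "type", "unit", "year", "import_kosis"))
import_long %<>%
  filter(type == "수입액[천달러]") %>%
  mutate(import_kosis = as.numeric(import_kosis) * 1000,
         year = parse_number(year)) %>%
  select(-type, -unit)

# Combine exports and imports into one country-year table.
trade <- export_long %>%
  left_join(import_long, by = c("country_kr", "year"))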
Using the iso3c_kr function, we can simply convert Korean country names into iso3c country codes. For example, the following is the output of the iso3c_kr function for the Korean trade data:
Code
# Pass the data frame and the name of the column that holds the Korean country names.
trade <- iso3c_kr(trade, "country_kr")
trade[c(50, 150, 250, 350, 450, 550), c(1, 5, 2:4)] %>% gt::gt()
| country_kr | iso3c | year | export_kosis | import_kosis |
|---|---|---|---|---|
| 계 | NA | 2014 | 572664607000 | 525514506000 |
| 아랍에미리트 연합 | ARE | 1996 | 1377933000 | 2259205000 |
| 앤티가바부다 | ATG | 1978 | NA | NA |
| 앵귈라 | AIA | 2019 | 817000 | 1000 |
| 아르메니아 | ARM | 2001 | 1255000 | 43000 |
| 앙골라 | AGO | 1983 | 235000 | NA |
In this example, “계” (gye) did not receive an iso3c country code because it is not a country name: “계” means “total”. It is best to check the data to see which entries did not get an iso3c code.
In the trade data, the entries without a code mean “total”, “IMF”, “other”, and “other countries” in Korean. In other words, we are not missing any countries, which is good.
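A check of this kind can be sketched as follows. The `coded` data frame is a made-up stand-in for the converted trade data, so only the pattern matters here, not the values.

```r
library(dplyr)

# Made-up converted data: "계" (total) has no country code.
coded <- tibble(
  country_kr = c("계", "계", "앙골라", "아르메니아"),
  iso3c = c(NA, NA, "AGO", "ARM"),
  year = c(2018, 2019, 2019, 2019)
)

# Unique source names that did not receive a code.
missing_iso3c <- coded %>%
  filter(is.na(iso3c)) %>%
  distinct(country_kr) %>%
  pull(country_kr)
```

Reading through `missing_iso3c` by eye is usually enough to tell aggregates such as totals apart from genuine country names that still need to be added.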
Now let’s convert the Korean country names in the aid data into iso3c country codes:
Code
aid %<>% set_names(c("country_kr", "sector", "no_of_projects", "aid_usd", "aid_krw"))
# Pass the data frame and the name of the column that holds the Korean country names.
aid <- iso3c_kr(aid, "country_kr")
aid[c(50, 150, 250, 350, 450, 550), c(1, 6, 2:5)] %>% gt::gt()
| country_kr | iso3c | sector | no_of_projects | aid_usd | aid_krw |
|---|---|---|---|---|---|
| 베트남 | VNM | 통신정책, 계획 및 행정(voluntary code) | 2 | 232334 | 270736486 |
| 캄보디아 | KHM | 11321 | 1 | 85815 | 99999361 |
| 미얀마 | MMR | 사회보호/보장 | 1 | 103460 | 120560903 |
| 라오스 | LAO | 비정규 농업훈련 | 1 | 107958 | 125802378 |
| 몽골 | MNG | 의료서비스 | 5 | 511824 | 596423389 |
| 필리핀 | PHL | 농업용수자원 | 2 | 0 | 0 |
Once you know the iso3c country codes, you can get the English country names, or other country codes (such as Correlates of War country codes) using the countrycode package, for example.
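As a small sketch of that step, the snippet below converts a few iso3c codes with `countrycode()`. The `codes` vector is made up for illustration; `"country.name"` and `"cown"` are destination names used by the `countrycode` package for English names and Correlates of War numeric codes.

```r
library(countrycode)

codes <- c("ARM", "BGD", "COL")

# English country names from ISO 3166-1 alpha-3 codes.
country_name <- countrycode(codes, origin = "iso3c", destination = "country.name")

# Correlates of War numeric codes for the same countries.
cow_code <- countrycode(codes, origin = "iso3c", destination = "cown")
```

The same call works inside `mutate()` on a full data frame, with the `iso3c` column as the first argument.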
More importantly, this function lets users join different datasets that contain Korean country names. In this example, I join the trade data with the aid data using the iso3c country codes.
Code
# The sample aid data covers 2019 only, so add the year explicitly before joining.
aid$year <- 2019
trade_aid <- trade %>%
  left_join(aid, by = c("iso3c", "year"), suffix = c("", "_aid"))
trade_aid %>%
  filter(year == 2019 & !is.na(iso3c)) %>%
  slice(c(50, 150, 250, 350, 450, 550)) %>%
  select(c(1, 5, 6, 2:4, 8, 10)) %>%
  gt::gt()
| country_kr | iso3c | country_name | year | export_kosis | import_kosis | sector | aid_usd |
|---|---|---|---|---|---|---|---|
| 아르메니아 | ARM | Armenia | 2019 | 12729000 | 16743000 | 전문대,대학(원) 교육 | 119069 |
| 방글라데시 | BGD | Bangladesh | 2019 | 1282342000 | 404703000 | 건설정책 및 행정관리 | 46251 |
| 볼리비아 | BOL | Bolivia | 2019 | 30434000 | 450576000 | 환경정책 및 행정관리 | 80969 |
| 코트디부아르 | CIV | Côte d’Ivoire | 2019 | 136494000 | 5264000 | 교육정책 및 행정관리 | 30096 |
| 콜롬비아 | COL | Colombia | 2019 | 1143075000 | 718214000 | 성인 기초생활교육 | 62976 |
| 알제리 | DZA | Algeria | 2019 | 700918000 | 1746239000 | 레크리에이션 및 스포츠(voluntary code) | 22312 |
Voilà! Now we have a dataset that combines trade and aid data, neither of which originally had consistent country names or country codes. I plan to add warning messages to the iso3c_kr function to make it easier to spot potential issues with the conversion of Korean country names. I will continue to update the Korean country name dataset in the kdiplo package as I come across new data sources. Feel free to report country names that iso3c_kr does not yet recognize using the issue tracker on GitHub (https://github.com/kjayhan/kdiplo/issues).
Source Code
---
title: "Working with Korean Diplomatic Datasets"
subtitle: "Introducing the `iso3c_kr` Function in the `kdiplo` R Package for Converting Korean Country Names into iso3c Country Codes"
author: "Kadir Jun Ayhan"
format:
  html:
    embed-resources: true
    code-fold: true
    code-summary: "Show the code"
    code-tools: true
date: "2024-04-23"
editor: visual
echo: true
warning: false
image: hanguel.png
draft: false
comments:
  hypothesis: true
categories:
  - Korea
  - Korean dataset
  - dataset
  - R
  - R package
  - diplomacy
  - kdiplo
---

```{r}
#| include: false
library(kdiplo)
library(tidyverse)
library(magrittr)
```

In my research, I often work with country-year data from Korean sources, including data on diplomatic visits, trade, aid and so on. One of the fundamental difficulties I have had is the lack of universal country codes across different datasets. Further complicating matters is the inconsistency of country names in these datasets. For example, Democratic Republic of the Congo has five different spellings across different official sources that I could find: `r paste(unique(kdiplo::iso3c_data$country_kr[kdiplo::iso3c_data$iso3c == "COD"]), collapse = ", ")`.

To address this issue, I have created a function in my `kdiplo` package that converts Korean country names into ISO 3166-1 alpha-3 (*iso3c*) country codes. This function, `iso3c_kr`, is designed to assign universal iso3c country codes to Korean-language country names that will make it easier to join different kinds of data.

One still needs to check if the output is correct, especially for countries that have gone through political transitions such as Germany, Yugoslavia, Russia, Vietnam, Yemen and so on.

Sometimes the Korean government sources have overlapping data for Yugoslavia and Serbia, for example. In such cases, one needs to check the data and make sure that the data is correct.

For example, the following is sample Korean trade data from [Korean Statistical Information Service (KOSIS)](https://kosis.kr/statHtml/statHtml.do?orgId=360&tblId=DT_1R11006_FRM101&conn_path=I3):

```{r}
trade <- readxl::read_xlsx("../../../korea_visits/data/kosis_trade_240330.xlsx")
trade[533:538,c(1,57:62)] %>% gt::gt()
```

And, the following is sample Korean aid data from [Korea's ODA portal](https://stats.odakorea.go.kr/portal/odakorea/detail):

```{r}
aid <- readxl::read_xlsx("../../../covid determinants 220818/data/korea_total_aid_2019_230709.xlsx")
aid %<>% select(1:5)
aid[c(50, 150, 250, 350, 450),] %>% gt::gt()
```

Wide format is quite common in official Korean data sources. Trade data is in wide format. Before using the `iso3c_kr` function, let's first transform the trade data into a long (country-year) format to make it in the same format as the aid data. This will make joining the two datasets more feasible.

```{r}
export <- trade
import <- trade

export %<>% select(-`...63`)
export_long <- export %>%
  pivot_longer(4:62, names_to = "year", values_to = "export_kosis")
export_long %<>% set_names(c("country_kr", "type", "unit", "year", "export_kosis"))
export_long %<>%
  filter(type == "수출액[천달러]") %>%
  mutate(export_kosis = as.numeric(export_kosis) * 1000,
         year = parse_number(year)) %>%
  select(-type, -unit)

import %<>% select(-`...63`)
import_long <- import %>%
  pivot_longer(4:62, names_to = "year", values_to = "import_kosis")
import_long %<>% set_names(c("country_kr", "type", "unit", "year", "import_kosis"))
import_long %<>%
  filter(type == "수입액[천달러]") %>%
  mutate(import_kosis = as.numeric(import_kosis) * 1000,
         year = parse_number(year)) %>%
  select(-type, -unit)

trade <- export_long %>%
  left_join(import_long, by = c("country_kr", "year"))
```

Using the `iso3c_kr` function, we can simply convert Korean country names into iso3c country codes. For example, the following is the output of the `iso3c_kr` function for the Korean trade data:

```{r}
trade <- iso3c_kr(trade, "country_kr") #you copy paste the column name that has the Korean country names.
trade[c(50, 150, 250, 350, 450, 550), c(1,5, 2:4)] %>% gt::gt()
```

We see that in this example, "계" (*gyae*) did not get any iso3c country code. This is because the `iso3c_kr` function could not find the iso3c country code for this entry. This is because, it is not a country name. "계" means total. It is best to check the data to see which entries did not get an iso3c code.

```{r}
missing_iso3c <- trade %>%
  filter(is.na(iso3c)) %>%
  pull(country_kr) %>%
  unique()
paste(missing_iso3c, collapse = ", ")
```

They mean "total", "IMF", "other", and "other countries" in Korean. In other words, we are not missing any countries, which is good.

Now let's convert the Korean country names in the aid data into iso3c country codes:

```{r}
aid %<>% set_names(c("country_kr", "sector", "no_of_projects", "aid_usd", "aid_krw"))
aid <- iso3c_kr(aid, "country_kr") #you copy paste the column name that has the Korean country names.
aid[c(50, 150, 250, 350, 450, 550),c(1, 6, 2:5)] %>% gt::gt()
```

Once you know the iso3c country codes, you can get the English country names, or other country codes (such as Correlates of War country codes) using the `countrycode` package, for example.

```{r}
trade <- trade %>%
  mutate(country_name = countrycode::countrycode(iso3c, origin = "iso3c", destination = "country.name"))
trade[c(50, 150, 250, 350, 450, 550),c(1, 5, 6, 2:4)] %>% gt::gt()
```

More importantly, this function allows users to be able to join different datasets that have Korean country names. For example, one can join the trade data with the aid data using the iso3c country codes. In this example, I will join the trade data with the aid data using the iso3c country codes.

```{r}
# now that I think about it, this sample data is only 2019.
aid$year <- 2019
trade_aid <- trade %>%
  left_join(aid, by = c("iso3c", "year"), suffix = c("", "_aid"))
trade_aid %>%
  filter(year == 2019 & !is.na(iso3c)) %>%
  slice(c(50, 150, 250, 350, 450, 550)) %>%
  select(c(1, 5, 6, 2:4, 8, 10)) %>%
  gt::gt()
```

Voilà! Now we have a dataset that has both trade and aid data, both of which originally did not have consistent country names or country codes. I plan to add warning messages to the `iso3c_kr` function to make it easier to spot potential issues with the conversion of Korean country names. I will continue to update the Korean country name dataset in the `kdiplo` package as I come across new data sources. Feel free to report unavailable country names in the `iso3c_kr` function to me using the [issue tracker on Github](https://github.com/kjayhan/kdiplo/issues).